Support Virtual-GenAI monitoring by peachisai · Pull Request #13745 · apache/skywalking

peachisai · 2026-03-16T13:53:17Z

If this pull request closes/resolves/fixes an existing issue, replace the issue number. Closes #.
Update the CHANGES log.

wu-sheng · 2026-03-16T13:55:54Z

Could you show a screenshot with at least 3-4 minutes of data? One dot covering only one minute is not very clear.

wu-sheng

Review: Support Virtual-GenAI monitoring

Critical Issues

1. Config file exclude mismatch in server-starter/pom.xml
Exclude says gen-ai-settings.yml but actual file is gen-ai-config.yml.

2. requiredModules() returns empty array
GenAIAnalyzerModuleProvider uses CoreModule in start() but doesn't declare it in requiredModules(). Should return new String[] { CoreModule.NAME }.

3. Module naming convention violation
Existing analyzer modules use lowercase-hyphenated: agent-analyzer, log-analyzer, meter-analyzer. New module genAI-analyzer should be gen-ai-analyzer.

4. Package should be org.apache.skywalking.oap.server.analyzer.genai
Currently uses org.apache.skywalking.oap.meter.analyzer.* which collides with the existing meter-analyzer module's package. Since this is a trace span analyzer (shared across SkyWalking/OTEL/Zipkin receivers), the package should be org.apache.skywalking.oap.server.analyzer.genai.*.

Design Issues

5. Duplicate OAL metric
gen_ai_provider_resp_time and gen_ai_provider_latency_avg both compute from(GenAIProviderAccess.latency).longAvg(). Remove one.

6. totalCost semantics are confusing
Stored value is tokens * costPerM, dashboard divides by 1,000,000. Better to store actual cost by dividing at computation time.

7. Missing NamingControl in VirtualGenAIProcessor
Other virtual processors all use NamingControl to normalize service names. GenAI processor skips this.

8. Tag key inconsistency: gen_ai.stream.ttfr vs timeToFirstToken
Tag says "ttfr", field says "timeToFirstToken", doc doesn't mention this tag at all.

Code Quality Issues

9. GenAIConfigLoader constructor ignores Yaml parameter
Accepts Yaml but creates a new one in loadConfig().

10. fastjson dependency in e2e test
No new dependency version should be added directly in sub-module pom.xml.
Dependencies are managed by the BOM. We decided not to include fastjson because it has had many critical CVEs in the past. Each time, we would have to fix them by re-releasing a patch version, which is too painful.

11. E2E Dockerfile clones unpinned external repo
Dockerfile.provider clones spring-projects/spring-ai-examples without pinning a commit/tag. Any upstream change could break the e2e test.

12. Documentation typo
virtual-genai.md: "Virtual cache represent the Generative AI service nodes" - copy-paste from virtual-cache doc.

Minor Issues

13. Missing newline at end of file in multiple files: gen-ai-config.yml, menu.yaml, SPI files, e2e expected YAMLs, dashboard JSONs.

14. GenAIModelAccessDispatcher bypasses normal dispatch flow - directly calls MetricsStreamProcessor.getInstance().in(traffic).

15. VirtualGenAIProcessor.recordList should be final.

16. Blank line in import block in VirtualServiceAnalysisListener.java between java.util and lombok imports.

wu-sheng

Additional issue: should use percentile2 instead of percentile

All production OAL files use percentile2(10). The old percentile function only exists in e2e test OAL for backward-compatibility testing.

In virtual-gen-ai.oal, the following lines should use percentile2:

gen_ai_provider_latency_percentile = from(GenAIProviderAccess.latency).percentile2(10);
gen_ai_model_latency_percentile = from(GenAIModelAccess.latency).percentile2(10);
gen_ai_model_ttft_percentile = from(GenAIModelAccess.timeToFirstToken).filter(timeToFirstToken > 0).percentile2(10);

And your UI doesn't show the correct percentile labels.

peachisai · 2026-03-16T15:33:15Z

@wu-sheng
Hi,
Regarding point 6 (totalCost semantics):
Previously I found that if the cost of a single model call is extremely small, specifically less than 0.001, storing it directly as a decimal may cause the value to be rounded down to 0 when it is finally written to the database. This would lead to significant inaccuracies when summing the values for aggregate reports.
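The rounding concern can be illustrated with a minimal Java sketch. It assumes a price expressed per million tokens and a storage layer that keeps only three decimal places; the class name, method names, and sample numbers are hypothetical and are not taken from the PR.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

class EstimatedCostSketch {
    static final BigDecimal TOKENS_PER_MILLION = BigDecimal.valueOf(1_000_000L);

    // Scaled form: tokens * price-per-million-tokens, kept as an integer.
    static long scaledCost(long tokens, double pricePerMillionTokens) {
        return Math.round(tokens * pricePerMillionTokens);
    }

    // Direct form: the actual cost, truncated to an assumed storage precision.
    static BigDecimal truncatedCost(long tokens, double pricePerMillionTokens, int decimals) {
        return BigDecimal.valueOf(scaledCost(tokens, pricePerMillionTokens))
                         .divide(TOKENS_PER_MILLION, decimals, RoundingMode.DOWN);
    }

    // Converts a sum of scaled values back to the actual cost at read time.
    static BigDecimal toActualCost(long scaledSum) {
        return BigDecimal.valueOf(scaledSum).divide(TOKENS_PER_MILLION);
    }

    public static void main(String[] args) {
        long scaledSum = 0;
        BigDecimal truncatedSum = BigDecimal.ZERO;
        // 1000 calls of 200 tokens each at a hypothetical price of 2.5 per million tokens.
        for (int call = 0; call < 1000; call++) {
            scaledSum += scaledCost(200, 2.5);
            truncatedSum = truncatedSum.add(truncatedCost(200, 2.5, 3));
        }
        System.out.println("sum of scaled values, converted: " + toActualCost(scaledSum));
        System.out.println("sum of truncated values:         " + truncatedSum);
    }
}
```

Under these assumptions each call costs 0.0005, which truncates to zero at three decimals, so the second sum loses the whole amount while the scaled sum does not.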

wu-sheng · 2026-03-17T07:12:07Z

UI side got merged. When you update this PR, please include the submodule update.

peachisai · 2026-03-18T15:03:27Z

Not finished yet. Some checks fail in my local environment; I am still fixing them.

wu-sheng · 2026-03-19T00:07:34Z

docs/en/setup/service-agent/virtual-genai.md

@@ -0,0 +1,16 @@
+# Virtual GenAI


You need to update the demo to point here. I think the entry is under Marketplace/General Service?

Not just this: menu.yml under /docs/en is not updated either.

wu-sheng · 2026-03-19T00:07:57Z

e2e fails, please fix it.

peachisai · 2026-03-20T04:31:04Z

@wu-sheng
Hi, most of the issues above have been fixed. For point 14, though, I don't quite follow.

peachisai · 2026-03-20T04:42:59Z

The UI submodule cannot be pushed; it keeps loading. I will try again.

peachisai · 2026-03-20T13:01:34Z

The UI submodule has been updated.

wu-sheng · 2026-03-22T06:32:33Z

You have build errors.

peachisai · 2026-03-22T07:51:20Z

You have build errors.

It works on my side. Could you retrigger the workflow? It should be an intermittent issue.

wu-sheng · 2026-03-23T15:14:44Z

...lyzer/src/main/java/org/apache/skywalking/oap/analyzer/genai/service/GenAIMeterAnalyzer.java

+    }
+
+    @Override
+    public GenAIMetrics extractMetricsFromSWSpan(SpanObject span, SegmentObject segment) {


Let's prepare a UT for this. As I noticed, the config file seems to be missing something, so we should verify that it is processed correctly.

This UT should cover loading the file, matching the rules, and the estimated cost.

wu-sheng · 2026-03-24T12:11:30Z

Error: src/test/java/org/apache/skywalking/oap/server/starter/config/GenAIMeterAnalyzerTest.java:[108] (whitespace) EmptyLineSeparator: There is more than 1 empty line one after another.
Error: src/test/java/org/apache/skywalking/oap/server/starter/config/GenAIMeterAnalyzerTest.java:[154] (regexp) RegexpSingleline: Not allow chinese character !

Code style should be fixed.

wu-sheng · 2026-03-24T23:44:26Z

oap-server/server-starter/src/main/resources/oal/virtual-gen-ai.oal

+gen_ai_model_total_estimated_cost = from(GenAIModelAccess.totalEstimatedCost).sum();
+gen_ai_model_avg_estimated_cost = from(GenAIModelAccess.totalEstimatedCost).doubleAvg();


doubleAvg() on a long field

gen_ai_provider_avg_estimated_cost = from(GenAIProviderAccess.totalEstimatedCost).doubleAvg();
gen_ai_model_avg_estimated_cost = from(GenAIModelAccess.totalEstimatedCost).doubleAvg();

totalEstimatedCost is long in the source classes, but doubleAvg() is designed for double inputs. Verify this compiles and works correctly at OAL code generation time: if the generated code expects getXxx() returning double but gets long, there may be a type mismatch. Consider either changing the field to double or using longAvg() and adjusting the dashboard expression.
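As background for the trade-off between the two options, here is a minimal sketch of plain Java arithmetic only. It does not model the OAL code generator, and the class name, method names, and sample values are hypothetical. It shows that averaging long values with integer division drops the fractional part, while accumulating into a double keeps it.

```java
class AvgTypeSketch {
    // Average computed entirely in long arithmetic.
    static long longAverage(long[] values) {
        long sum = 0;
        for (long v : values) {
            sum += v;
        }
        return sum / values.length; // integer division truncates toward zero
    }

    // Average computed in double arithmetic from the same long inputs.
    static double doubleAverage(long[] values) {
        double sum = 0;
        for (long v : values) {
            sum += v; // each long widens to double implicitly
        }
        return sum / values.length;
    }

    public static void main(String[] args) {
        // Hypothetical stand-ins for long-typed totalEstimatedCost samples.
        long[] samples = {1L, 2L, 4L};
        System.out.println("long average:   " + longAverage(samples));
        System.out.println("double average: " + doubleAverage(samples));
    }
}
```

If the long-based average were chosen, any fractional part of the per-call average would be lost before the dashboard expression is applied.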

wu-sheng · 2026-03-24T23:47:15Z

...c/main/java/org/apache/skywalking/oap/analyzer/genai/matcher/GenAIProviderPrefixMatcher.java

+        String providerName;
+    }
+
+    public static class MatchResult {


Suggested change

-    public static class MatchResult {
+    @Data
+    public static class MatchResult {

Use Lombok?

wu-sheng · 2026-03-24T23:48:01Z

...c/main/java/org/apache/skywalking/oap/analyzer/genai/matcher/GenAIProviderPrefixMatcher.java

+        final Map<Character, TrieNode> children = new HashMap<>();
+        String providerName;


Should these be private fields, with @Data?
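A minimal sketch of what a trie node with private fields could look like, assuming the matcher returns the provider of the longest configured prefix. The accessors are written by hand so the example compiles without Lombok; @Data would generate equivalent getters, plus a setter for the non-final field. The class names, method names, and sample prefixes are hypothetical and are not the PR's code.

```java
import java.util.HashMap;
import java.util.Map;

class ProviderPrefixMatcherSketch {
    private static final class TrieNode {
        private final Map<Character, TrieNode> children = new HashMap<>();
        private String providerName; // non-null only where a configured prefix ends

        Map<Character, TrieNode> getChildren() {
            return children;
        }

        String getProviderName() {
            return providerName;
        }

        void setProviderName(String providerName) {
            this.providerName = providerName;
        }
    }

    private final TrieNode root = new TrieNode();

    // Registers a model-name prefix for a provider.
    void addPrefix(String prefix, String provider) {
        TrieNode node = root;
        for (char c : prefix.toCharArray()) {
            node = node.getChildren().computeIfAbsent(c, k -> new TrieNode());
        }
        node.setProviderName(provider);
    }

    // Returns the provider of the longest registered prefix of modelName, or null.
    String match(String modelName) {
        TrieNode node = root;
        String best = null;
        for (char c : modelName.toCharArray()) {
            node = node.getChildren().get(c);
            if (node == null) {
                break;
            }
            if (node.getProviderName() != null) {
                best = node.getProviderName();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        ProviderPrefixMatcherSketch matcher = new ProviderPrefixMatcherSketch();
        matcher.addPrefix("gpt", "openai"); // hypothetical rule
        System.out.println(matcher.match("gpt-4o"));
        System.out.println(matcher.match("unknown-model"));
    }
}
```

Keeping the fields private means the matcher only touches node state through accessors, which is the shape the Lombok annotation would give.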

wu-sheng · 2026-03-24T23:49:07Z

...er/src/test/java/org/apache/skywalking/oap/server/starter/config/GenAIMeterAnalyzerTest.java

+ *
+ */
+
+package org.apache.skywalking.oap.server.starter.config;


Let's move this test into the analyzer module. You could copy the gen-ai-config.yml file into the test directory of that module as well.

wu-sheng · 2026-03-24T23:55:15Z

oap-server/server-starter/src/main/resources/gen-ai-config.yml

+  - provider: groq
+    prefix-match:
+      - llama


We should not match every llama model to this provider. As an OSS model, llama could be served by anyone. We could simply remove this part.

wu-sheng

LGTM. Thanks for getting all of this fixed.

Support Virtual-GenAI monitoring

e29c3a9

wu-sheng added the backend OAP backend related. label Mar 16, 2026

wu-sheng added this to the 10.4.0 milestone Mar 16, 2026

fix changes

3642ce3

wu-sheng requested changes Mar 16, 2026

wu-sheng reviewed Mar 16, 2026

wu-sheng added the feature New feature label Mar 16, 2026

wu-sheng and others added 2 commits March 17, 2026 15:12

Merge branch 'master' into Support-GenAI-monitoring

37708a2

Merge remote-tracking branch 'origin/prd' into Support-GenAI-monitoring

0de57e7

wu-sheng reviewed Mar 19, 2026

wu-sheng reviewed Mar 19, 2026

wu-sheng added 2 commits March 19, 2026 12:17

Merge branch 'master' into Support-GenAI-monitoring

45d255f

Merge branch 'master' into Support-GenAI-monitoring

f15e579

peachisai added 2 commits March 20, 2026 12:33

fix some issues

d2c2165

Merge branches 'Support-GenAI-monitoring' and 'Support-GenAI-monitori…

f417ea5

…ng' of github.com:peachisai/skywalking into Support-GenAI-monitoring

wu-sheng reviewed Mar 20, 2026

wu-sheng and others added 3 commits March 20, 2026 20:34

Merge branch 'master' into Support-GenAI-monitoring

4f1ea70

fix

ca9704e

Merge branch 'Support-GenAI-monitoring' of github.com:peachisai/skywa…

fab132c

…lking into Support-GenAI-monitoring

Merge branch 'master' into Support-GenAI-monitoring

38af222

wu-sheng reviewed Mar 23, 2026

wu-sheng reviewed Mar 23, 2026

wu-sheng reviewed Mar 23, 2026

wu-sheng reviewed Mar 23, 2026

peachisai added 3 commits March 24, 2026 19:32

fix

ea7e330

fix

1f5d5ac

fix

227fef4